Multi-classification of Patent Applications with Winnow

نویسندگان

  • Cornelis H. A. Koster
  • Marc Seutter
  • Jean Beney
چکیده

The Winnow family of learning algorithms can cope well with large numbers of features and is tolerant to variations in document length, which makes it suitable for classifying large collections of large documents, like patent applications. Both the large size of the documents and the large number of available training documents for each class make this classification task qualitatively different from the classification of short documents (newspaper articles or medical abstracts) with few training examples, as exemplified by the TREC evaluations. This note describes recent experiments with Winnow on two large corpora of patent applications, supplied by the European Patent Office (EPO). It is found that the multi-classification of patent applications is much less accurate than the mono-classification of similar documents. We describe a potential pitfall in multi-classification and show ways to improve the accuracy. We argue that the inherently larger noisiness of multi-class labeling is the reason that multi-classification is harder than mono-classification.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text Categorization for Intellectual Property Comparing Balanced Winnow with SVM on Different Document Representations

This study investigates the effect of training different categorization algorithms on various patent document representations. The automation of knowledge and content management in the intellectual property domain has been experiencing a growing interest in the last decade, since the first patent classification system was presented in 1999 by Larkey [Larkey, 1999]. Typical applications of paten...

متن کامل

Comparative Analysis of Balanced Winnow and SVM in Large Scale Patent Categorization

This study investigates the effect of training different categorization algorithms on a corpus that is significantly larger than those reported in experiments in the literature. By means of machine learning techniques, a collection of 1.2 million patent applications is used to build a classifier that is able to classify documents with varyingly large feature spaces into the International Classi...

متن کامل

myClass: A Mature Tool for Patent Classification

In this task 2,000 patents in three languages (English, French and German) were to be classified among approximately 600 categories. We used a classifier based on neural networks of the Winnow type. This classifier is already used for similar tasks in professional applications. We tested three different approaches to improve the classification accuracy: the first one aimed at solving the issue ...

متن کامل

Hand Gestures Classification with Multi-Core DTW

Classifications of several gesture types are very helpful in several applications. This paper tries to address fast classifications of hand gestures using DTW over multi-core simple processors. We presented a methodology to distribute templates over multi-cores and then allow parallel execution of the classification. The results were presented to voting algorithm in which the majority vote was ...

متن کامل

Text Chunking based on a Generalization of Winnow

This paper describes a text chunking system based on a generalization of the Winnow algorithm. We propose a general statistical model for text chunking which we then convert into a classification problem. We argue that the Winnow family of algorithms is particularly suitable for solving classification problems arising from NLP applications, due to their robustness to irrelevant features. Howeve...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003